Geometric Algebra Transformer

Neural Information Processing Systems

Problems involving geometric data arise in physics, chemistry, robotics, computer vision, and many other fields. Such data can take numerous forms, for instance points, direction vectors, translations, or rotations, but to date there is no single architecture that can be applied to such a wide variety of geometric types while respecting their symmetries. In this paper we introduce the Geometric Algebra Transformer (GATr), a general-purpose architecture for geometric data. GATr represents inputs, outputs, and hidden states in the projective geometric (or Clifford) algebra, which offers an efficient 16-dimensional vector-space representation of common geometric objects as well as operators acting on them. GATr is equivariant with respect to E(3), the symmetry group of 3D Euclidean space. As a Transformer, GATr is versatile, efficient, and scalable. We demonstrate GATr in problems from n-body modeling to wall-shear-stress estimation on large arterial meshes to robotic motion planning. GATr consistently outperforms both non-geometric and equivariant baselines in terms of error, data efficiency, and scalability.
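As a rough illustration of the 16-dimensional representation mentioned in the abstract: the projective geometric algebra G(3,0,1) has one scalar, four vector, six bivector, four trivector, and one pseudoscalar basis blade, and a 3D point can be embedded into the trivector slots with a homogeneous coordinate. The basis ordering and sign conventions below are my own assumption for the sketch, not necessarily those used by GATr.

```python
import numpy as np

# Basis of the 16-dimensional projective geometric algebra G(3,0,1):
# 1 scalar, 4 vectors, 6 bivectors, 4 trivectors, 1 pseudoscalar.
BLADES = ["1",
          "e0", "e1", "e2", "e3",
          "e01", "e02", "e03", "e12", "e13", "e23",
          "e012", "e013", "e023", "e123",
          "e0123"]

def embed_point(x, y, z):
    """Embed a 3D point as a 16-dim multivector (hypothetical convention:
    coordinates in the trivector slots, e123 as the homogeneous
    component; GATr's exact ordering/signs may differ)."""
    mv = np.zeros(16)
    mv[BLADES.index("e123")] = 1.0   # homogeneous component
    mv[BLADES.index("e023")] = x
    mv[BLADES.index("e013")] = y
    mv[BLADES.index("e012")] = z
    return mv

mv = embed_point(1.0, 2.0, 3.0)
```

Operators such as rotations and translations occupy the remaining grades, which is why a single 16-dimensional channel can carry heterogeneous geometric types.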


GeoFM: Enhancing Geometric Reasoning of MLLMs via Synthetic Data Generation through Formal Language

Zhang, Yuhao, Hu, Dingxin, Yu, Tinghao, Liu, Hao, Liu, Yiting

arXiv.org Artificial Intelligence

Multi-modal Large Language Models (MLLMs) have gained significant attention in both academia and industry for their capabilities in handling multi-modal tasks. However, these models face challenges in mathematical geometric reasoning due to the scarcity of high-quality geometric data. To address this issue, synthetic geometric data has become an essential strategy. Current methods for generating synthetic geometric data involve rephrasing or expanding existing problems and utilizing predefined rules and templates to create geometric images and problems. However, these approaches often produce data that lacks diversity or is prone to noise. Additionally, the geometric images synthesized by existing methods tend to exhibit limited variation and deviate significantly from authentic geometric diagrams. To overcome these limitations, we propose GeoFM, a novel method for synthesizing geometric data. GeoFM uses formal languages to explore combinations of conditions within metric space, generating high-fidelity geometric problems that differ from the originals while ensuring correctness through a symbolic engine. Experimental results show that our synthetic data significantly outperforms existing methods. The model trained with our data surpasses the proprietary GPT-4o model by 18.7% on geometry problem-solving tasks in MathVista and by 16.5% on GeoQA. Additionally, it exceeds the performance of a leading open-source model by 5.7% on MathVista and by 2.7% on GeoQA.


Revisiting Transformation Invariant Geometric Deep Learning: An Initial Representation Perspective

Zhang, Ziwei, Wang, Xin, Zhang, Zeyang, Cui, Peng, Zhu, Wenwu

arXiv.org Artificial Intelligence

Deep neural networks have achieved great success in the last decade. When designing neural networks to handle ubiquitous geometric data such as point clouds and graphs, it is critical that the model can maintain invariance towards various transformations such as translation, rotation, and scaling. Most existing graph neural network (GNN) approaches can only maintain permutation-invariance, failing to guarantee invariance with respect to other transformations. Besides GNNs, other works design sophisticated transformation-invariant layers, which are computationally expensive and difficult to extend. In this paper, we revisit why general neural networks cannot maintain transformation invariance. Our findings show that transformation-invariant and distance-preserving initial point representations are sufficient to achieve transformation invariance, without the need for sophisticated neural layer designs. Motivated by these findings, we propose Transformation Invariant Neural Networks (TinvNN), a straightforward and general plug-in for geometric data. Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling and feed the representations into existing neural networks. We prove that TinvNN can strictly guarantee transformation invariance, being general and flexible enough to be combined with existing neural networks. Extensive experimental results on point cloud analysis and combinatorial optimization demonstrate the effectiveness and general applicability of our method. We also extend our method to equivariance cases. Based on the results, we advocate that TinvNN should be considered as an essential baseline for further studies of transformation-invariant geometric deep learning.
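The core idea can be sketched with plain classical multi-dimensional scaling (TinvNN itself uses a modified MDS): because the embedding depends only on pairwise distances, it is unchanged by translations and rotations of the input, and it preserves those distances.

```python
import numpy as np

def tinv_embedding(points):
    """Transformation-invariant initial representation via classical
    MDS (a plain-MDS sketch of the idea; the paper's method modifies
    MDS). Depends only on pairwise distances, so rigid motions of the
    input leave it unchanged."""
    n = points.shape[0]
    # Squared pairwise distances -- invariant to rigid motions.
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    # Double-centering: B = -1/2 * J @ sq @ J with J = I - 11^T / n.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ sq @ J
    # Eigendecomposition; keep the positive part of the spectrum.
    w, V = np.linalg.eigh(B)
    w, V = w[::-1], V[:, ::-1]          # descending order
    k = int(np.sum(w > 1e-9))
    return V[:, :k] * np.sqrt(w[:k])

pts = np.random.default_rng(0).normal(size=(8, 3))
emb = tinv_embedding(pts)
```

The resulting coordinates can then be fed into any downstream network, which is what makes the approach a plug-in rather than a bespoke architecture.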


STREAM: A Universal State-Space Model for Sparse Geometric Data

Schöne, Mark, Bhisikar, Yash, Bania, Karan, Nazeer, Khaleelulla Khan, Mayr, Christian, Subramoney, Anand, Kappel, David

arXiv.org Artificial Intelligence

Handling sparse and unstructured geometric data, such as point clouds or event-based vision, is a pressing challenge in the field of machine vision. Recently, sequence models such as Transformers and state-space models entered the domain of geometric data. These methods require specialized preprocessing to create a sequential view of a set of points. Furthermore, prior works involving sequence models iterate over geometric data with either uniform or learned step sizes, implicitly relying on the model to infer the underlying geometric structure. In this work, we propose to encode geometric structure explicitly into the parameterization of a state-space model. State-space models are based on linear dynamics governed by a one-dimensional variable such as time or a spatial coordinate. We exploit this dynamic variable to inject relative differences of coordinates into the step size of the state-space model. The resulting geometric operation computes interactions between all pairs of N points in O(N) steps. Our model deploys the Mamba selective state-space model with a modified CUDA kernel to efficiently map sparse geometric data to modern hardware. The resulting sequence model, which we call STREAM, achieves competitive results on a range of benchmarks from point-cloud classification to event-based vision and audio classification. STREAM demonstrates a powerful inductive bias for sparse geometric data by improving the PointMamba baseline when trained from scratch on the ModelNet40 and ScanObjectNN point cloud analysis datasets. It further achieves, for the first time, 100% test accuracy on all 11 classes of the DVS128 Gestures dataset.
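The step-size idea can be illustrated with a minimal scalar state-space recurrence where the discretization step for each token is the distance to the previous point, so nearby points interact strongly and distant ones decay. This is a hypothetical toy sketch; STREAM's actual parameterization (selective Mamba dynamics, custom CUDA kernel) is far richer.

```python
import numpy as np

def geometric_ssm_scan(coords, features, a=-1.0, b=1.0):
    """Toy 1D linear SSM driven by geometric step sizes:
        h_t = exp(a * dt_t) * h_{t-1} + dt_t * b * x_t,
    where dt_t is the distance between consecutive points, so the
    state decays faster across large spatial gaps."""
    h, outputs = 0.0, []
    prev = coords[0]
    for p, x in zip(coords, features):
        dt = float(np.linalg.norm(p - prev))  # relative coordinate difference
        h = np.exp(a * dt) * h + dt * b * x
        outputs.append(h)
        prev = p
    return np.array(outputs)

# Two nearby points followed by a distant one: the distant point's
# large step size damps the state carried over from its predecessors.
coords = np.array([[0.0], [0.1], [5.0]])
out = geometric_ssm_scan(coords, np.array([1.0, 1.0, 1.0]))
```

A parallel scan over this recurrence is what yields all-pairs interactions in O(N) steps.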


What is the intrinsic dimension of your binary data? -- and how to compute it quickly

Hanika, Tom, Hille, Tobias

arXiv.org Artificial Intelligence

Dimensionality is an important aspect for analyzing and understanding (high-dimensional) data. In their 2006 ICDM paper, Tatti et al. answered the question of an (interpretable) dimension for binary data tables by introducing a normalized correlation dimension. In the present work we revisit their results and contrast them with a concept-based notion of intrinsic dimension (ID) recently introduced for geometric data sets. To do this, we present a novel approximation for this ID that is based on computing concepts only up to a certain support value. We demonstrate and evaluate our approximation using all available datasets from Tatti et al., which have between 469 and 41271 extrinsic dimensions.


Molecular Geometry Pretraining with SE(3)-Invariant Denoising Distance Matching

Liu, Shengchao, Guo, Hongyu, Tang, Jian

arXiv.org Artificial Intelligence

Molecular representation pretraining is critical in various applications for drug and material discovery due to the limited number of labeled molecules, and most existing work focuses on pretraining on 2D molecular graphs. However, the power of pretraining on 3D geometric structures has been less explored, largely due to the difficulty of finding a suitable proxy task that allows the pretraining to effectively extract essential features from the geometric structures. Motivated by the dynamic nature of 3D molecules, where the continuous motion of a molecule in the 3D Euclidean space forms a smooth potential energy surface, we propose GeoSSL, a 3D coordinate denoising pretraining framework to model such an energy landscape. Further, by leveraging an SE(3)-invariant score matching method, we propose GeoSSL-DDM, in which the coordinate denoising proxy task is effectively boiled down to denoising the pairwise atomic distances in a molecule. Our comprehensive experiments confirm the effectiveness and robustness of our proposed method.
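The invariance argument behind the distance-denoising proxy task is easy to sketch: perturb the 3D coordinates with Gaussian noise and regress the noisy pairwise distances back toward the clean ones. Distances are unchanged by rotations and translations, so the target is SE(3)-invariant by construction. This is illustrative only; GeoSSL-DDM formulates the task as denoising score matching rather than plain regression.

```python
import numpy as np

def ddm_targets(coords, sigma=0.1, rng=None):
    """Build a denoising pair: noisy pairwise distances (model input)
    and clean pairwise distances (regression target). Both matrices
    are invariant under rigid motions of the conformation."""
    rng = rng if rng is not None else np.random.default_rng()
    noisy = coords + sigma * rng.normal(size=coords.shape)
    clean_d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    noisy_d = np.linalg.norm(noisy[:, None] - noisy[None, :], axis=-1)
    return noisy_d, clean_d

def denoising_loss(pred_d, clean_d):
    """Mean-squared error between predicted and clean distances."""
    return float(np.mean((pred_d - clean_d) ** 2))

coords = np.random.default_rng(0).normal(size=(5, 3))
noisy_d, clean_d = ddm_targets(coords, rng=np.random.default_rng(0))
```

Because the targets are distances rather than coordinates, the model never needs to resolve the global pose of the molecule.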


CupNet -- Pruning a network for geometric data

Heese, Raoul, Morand, Lukas, Helm, Dirk, Bortz, Michael

arXiv.org Machine Learning

The optimization of production processes can benefit from machine learning methods that incorporate domain knowledge and data from numerical simulations [1]. Typically, such methods aim to model relations between process parameters and the resulting product. In this manuscript, we consider an example from the field of deep drawing, a sheet metal forming process in which a sheet metal blank is drawn into a forming die by mechanical action. Specifically, we study the prediction of product geometries in a cup drawing process based on data from finite element simulations [2].


Alec Jacobson: Geometry Processing in The Wild CMU RI Seminar

Robohub

Abstract: "Geometric data abounds, but our algorithms for geometry processing are failing. Whether from medical imagery, free-form architecture, self-driving cars, or 3D-printed parts, geometric data is often messy, riddled with "defects" that cause algorithms to crash or behave unpredictably. The traditional philosophy assumes geometry is given with 100% certainty and that algorithms can use whatever discretization is most convenient. As a result, geometric pipelines are leaky patchworks requiring esoteric training and involving many different people. Instead, we adapt fundamental mathematics to work directly on messy geometric data. As an archetypical example, I will discuss how to generalize the classic formula for determining the inside from the outside of a curve to messy representations of a 3D surface. This work helps keep the geometry processing pipeline flowing, as validated on our large-scale geometry benchmarks. Our long term vision is to replace the current leaky geometry processing pipeline with a robust workflow where processing operates directly on real geometric data found "in the wild". To do this, we need to rethink how algorithms should gracefully degrade when confronted with imprecision and uncertainty. Our most recent work on differentiable rendering and geometry-based adversarial attacks on image classification demonstrates the potential power of this philosophy."
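The "classic formula for determining the inside from the outside of a curve" referenced in the talk is the winding number. A minimal 2D version for a closed polygon, shown below, sums the signed angles subtended at a query point; the research discussed in the seminar generalizes this quantity to messy, non-watertight 3D surfaces.

```python
import numpy as np

def winding_number(query, polygon):
    """Sum of signed angles subtended at `query` by each edge of a
    closed 2D polygon: ~1 for points inside a counter-clockwise
    polygon, ~0 for points outside."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        a = polygon[i] - query
        b = polygon[(i + 1) % n] - query
        cross = a[0] * b[1] - a[1] * b[0]   # 2D cross product (scalar)
        total += np.arctan2(cross, np.dot(a, b))
    return total / (2 * np.pi)

# Unit square, counter-clockwise.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
```

Unlike a ray-casting test, this formula degrades gracefully on open or self-intersecting curves, which is precisely the robustness property the talk advocates.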